STOCHASTIC ALGORITHMS FOR COMPUTING MEANS OF 
PROBABILITY MEASURES 



MARC ARNAUDON, CLEMENT DOMBRY, ANTHONY PHAN, AND LE YANG 

Abstract. Consider a probability measure $\mu$ supported by a regular geodesic
ball in a manifold. For any $p \geq 1$ we define a stochastic algorithm which
converges almost surely to the $p$-mean $e_p$ of $\mu$. Assuming furthermore that the
functional to minimize is regular around $e_p$, we prove that a natural
renormalization of the inhomogeneous Markov chain converges in law to an
inhomogeneous diffusion process. We give an explicit expression of this process, as
well as its local characteristic.

1. Introduction

The geometric barycenter of a set of points is the point which minimizes the sum
of the squared distances to these points. It is the most common estimator
in statistics; however, it is sensitive to outliers, and it is natural to replace the power 2
by $p$ for some $p \in [1,2)$, which leads to the definition of the $p$-mean. When $p = 1$, the
minimizer is the median of the set of points, very often used in robust statistics. In
many applications, $p$-means with some $p \in (1,2)$ give the best compromise.

The Fermat-Weber problem concerns finding the median $e_1$ of a set of points in
a Euclidean space. Numerous authors worked out algorithms for computing $e_1$.
The first algorithm was proposed by Weiszfeld in [21]. It has been extended to
sufficiently small domains in Riemannian manifolds with nonnegative curvature by
Fletcher et al. in [7]. A complete generalization to manifolds with positive or
negative curvature, including existence and uniqueness results (under some convexity
conditions in positive curvature), has been given by one of the authors in [22].

The Riemannian barycenter or Karcher mean of a set of points in a manifold or,
more generally, of a probability measure has been extensively studied, see e.g. [8],
[10], [11], [SJ, [18], [2], where questions of existence, uniqueness, stability, relation
with martingales in manifolds, and behaviour when measures are pushed by stochastic
flows have been considered. The Riemannian barycenter corresponds to $p = 2$ in
the above description. Computation of Riemannian barycenters by gradient descent
has been performed by Le in [13].

In [1] Afsari proved existence and uniqueness of $p$-means, $p \geq 1$, on geodesic
balls with radius $r < \frac{1}{2}\min\left\{\mathrm{inj}(M), \frac{\pi}{2\alpha}\right\}$ if $p \in [1,2)$, and
$r < \frac{1}{2}\min\left\{\mathrm{inj}(M), \frac{\pi}{\alpha}\right\}$ if $p \geq 2$. Here $\mathrm{inj}(M)$ is the injectivity radius of $M$ and $\alpha > 0$ is such that the
sectional curvatures in $M$ are bounded above by $\alpha^2$. The point is that in the case
$p \geq 2$, the functional to minimize is no longer convex, which makes the situation
much more difficult to handle.

In this paper, under the assumptions of [1], we provide in Theorem 2.3 stochastic
algorithms which converge almost surely to $p$-means in manifolds, and which are easier
to implement than gradient descent algorithms since computing the gradient of the
function to minimize is not needed. The idea is, at each step, to move in the direction of
a point of the support of $\mu$. The point is chosen at random according to $\mu$, and the
size of the step is a well-chosen function of the distance to the point, of $p$ and of the
step number. For general convergence results on recursive stochastic algorithms,
see [14] Theorem 1. However, they do not cover the manifold case and the nonlinearity
of geodesics. Here we give a proof using the martingale convergence theorem;
the main point consists in determining and estimating all the geometric quantities,
and checking that under our curvature conditions all the convergence assumptions
are fulfilled, since our processes live in manifolds. See also [3] for convergence in
probability of recursive algorithms.

The speed of convergence is studied, and in Theorem 2.6 we prove that the
renormalized inhomogeneous Markov chain of Theorem 2.3 converges in law to an
inhomogeneous diffusion process. This is an invariance principle type result, see
e.g. [5], [IS], [1], [5J for related works. Here again the main point is to obtain the
characteristics of the limiting process from the curvature conditions, the conditions
on the support of the measure and estimates on Jacobi fields. Moreover we consider
convergence in law for the Skorohod topology, and the limit depends in a crucial
way on the decreasing steps of the algorithm.



2. Results

2.1. p-means in regular geodesic balls. Let $M$ be a Riemannian manifold with
pinched sectional curvatures. Let $\alpha, \beta > 0$ be such that $\alpha^2$ is a positive upper bound
for sectional curvatures on $M$, and $-\beta^2$ is a negative lower bound for sectional
curvatures on $M$. Denote by $\rho$ the Riemannian distance on $M$.

In $M$ consider a geodesic ball $B(a,r)$ with $a \in M$. Let $\mu$ be a probability measure
with support included in a compact convex subset $K_\mu$ of $B(a,r)$. Fix $p \in [1,\infty)$.
We will always make the following assumptions on $(r,p,\mu)$:

Assumption 2.1. The support of $\mu$ is not reduced to one point. Either $p > 1$ or
the support of $\mu$ is not contained in a line, and the radius $r$ satisfies

(2.1) $$r < r_{\alpha,p} \quad\text{with}\quad r_{\alpha,p} = \begin{cases} \frac{1}{2}\min\left\{\mathrm{inj}(M), \frac{\pi}{2\alpha}\right\} & \text{if } p \in [1,2), \\ \frac{1}{2}\min\left\{\mathrm{inj}(M), \frac{\pi}{\alpha}\right\} & \text{if } p \in [2,\infty). \end{cases}$$

Note that $B(a,r)$ is convex if $r < \frac{1}{2}\min\left\{\mathrm{inj}(M), \frac{\pi}{\alpha}\right\}$.

Under Assumption 2.1 it has been proved in [1] (Theorem 2.1) that the function

(2.2) $$H_p : M \to \mathbb{R}_+, \qquad x \mapsto \int_M \rho^p(x,y)\,\mu(dy)$$

has a unique minimizer $e_p$ in $M$, the $p$-mean of $\mu$, and moreover $e_p \in B(a,r)$. If
$p = 1$, $e_1$ is the median of $\mu$.

It is easily checked that if $p \in [1,2)$, then $H_p$ is strictly convex on $B(a,r)$. On
the other hand, if $p \geq 2$ then $H_p$ is of class $C^2$ on $B(a,r)$.

Proposition 2.2. Let $K$ be a convex subset of $B(a,r)$ containing the support of $\mu$.
Then there exists $C_{p,\mu,K} > 0$ such that for all $x \in K$,

(2.3) $$H_p(x) - H_p(e_p) \geq \frac{C_{p,\mu,K}}{2}\,\rho(x,e_p)^2.$$

Moreover if $p \geq 2$ then we can choose $C_{p,\mu,K}$ so that for all $x \in K$,

(2.4) $$\|\mathrm{grad}_x H_p\|^2 \geq C_{p,\mu,K}\left(H_p(x) - H_p(e_p)\right).$$
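In the sequel, we fix

(2.5) $$K = \bar B(a, r-\varepsilon) \quad\text{with}\quad \varepsilon = \frac{\rho(K_\mu, B(a,r)^c)}{2},$$

where $K_\mu$ denotes the compact convex subset of $B(a,r)$ containing the support of $\mu$.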

We now state our main result: we define a stochastic gradient algorithm $(X_k)_{k \geq 0}$
to approximate the $p$-mean $e_p$ and prove its convergence.

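Theorem 2.3. Let $(P_k)_{k \geq 1}$ be a sequence of independent $B(a,r)$-valued random
variables, with law $\mu$. Let $(t_k)_{k \geq 1}$ be a sequence of positive numbers satisfying

(2.6) $$\forall k \geq 1, \quad t_k \leq \min\left(\frac{1}{C_{p,\mu,K}},\ \frac{\rho(K_\mu, B(a,r)^c)}{2p(2r)^{p-1}}\right),$$

(2.7) $$\sum_{k=1}^{\infty} t_k = +\infty \quad\text{and}\quad \sum_{k=1}^{\infty} t_k^2 < \infty.$$

Letting $x_0 \in K$, define inductively the random walk $(X_k)_{k \geq 0}$ by

(2.8) $$X_0 = x_0 \quad\text{and for } k \geq 0, \quad X_{k+1} = \exp_{X_k}\left(-t_{k+1}\,\mathrm{grad}_{X_k} F_p(\cdot, P_{k+1})\right)$$

where $F_p(x,y) = \rho^p(x,y)$, with the convention $\mathrm{grad}_x F_p(\cdot, x) = 0$.
The random walk $(X_k)_{k \geq 0}$ converges in $L^2$ and almost surely to $e_p$.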
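
The recursion (2.8) can be sketched numerically on the unit sphere $S^2$. The following Python sketch is only an illustration under stated assumptions: the helper names (`sphere_log`, `sphere_exp`, `stochastic_p_mean`), the four-atom test measure and the step sequence $t_k = \min(0.2, 1/k)$ are hypothetical choices made here, and the step sequence has not been checked against the explicit bound (2.6). It uses the fact that $-\mathrm{grad}_x F_p(\cdot,y) = p\,\rho^{p-1}(x,y)\,n(x,y)$, where $n(x,y)$ is the unit vector at $x$ pointing towards $y$.

```python
import math
import random


def sphere_log(x, y):
    """Return (rho(x, y), n(x, y)): the geodesic distance and the unit tangent
    vector at x pointing towards y, on the unit sphere."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(x, y))))
    dist = math.acos(dot)
    tangent = [b - dot * a for a, b in zip(x, y)]  # projection of y onto T_x
    norm = math.sqrt(sum(c * c for c in tangent))
    if norm < 1e-12:
        return 0.0, [0.0] * len(x)
    return dist, [c / norm for c in tangent]


def sphere_exp(x, direction, length):
    """Point reached from x after moving `length` along the geodesic whose
    initial velocity is the unit vector `direction`."""
    z = [math.cos(length) * a + math.sin(length) * b for a, b in zip(x, direction)]
    norm = math.sqrt(sum(c * c for c in z))  # renormalise against round-off
    return [c / norm for c in z]


def stochastic_p_mean(sample, x0, p, step, n_steps):
    """Iterate X_{k+1} = exp_{X_k}(-t_{k+1} grad F_p(., P_{k+1})) as in (2.8)."""
    x = list(x0)
    for k in range(1, n_steps + 1):
        dist, n_xy = sphere_log(x, sample())
        if dist > 0.0:  # convention: the gradient vanishes when X_k = P_{k+1}
            # -grad F_p(., y) at x equals p * rho^(p-1) * n(x, y)
            x = sphere_exp(x, n_xy, step(k) * p * dist ** (p - 1))
    return x


# Hypothetical test measure: four atoms placed symmetrically around the north
# pole, so that by symmetry the p-mean is the north pole (0, 0, 1).
THETA = 0.3
ATOMS = [
    (math.sin(THETA) * math.cos(phi), math.sin(THETA) * math.sin(phi), math.cos(THETA))
    for phi in (0.0, 0.5 * math.pi, math.pi, 1.5 * math.pi)
]
rng = random.Random(0)
estimate = stochastic_p_mean(
    sample=lambda: rng.choice(ATOMS),
    x0=ATOMS[0],
    p=1.5,
    step=lambda k: min(0.2, 1.0 / k),  # assumed step sequence, see lead-in
    n_steps=20000,
)
print(estimate)
```

Note that no gradient of $H_p$ is ever computed: each step only needs the distance and direction from the current point to one sampled point.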

In the following example, we focus on the case $M = \mathbb{R}^d$ and $p = 2$ where drastic
simplifications occur.

Example 2.4. In the case when $M = \mathbb{R}^d$ and $\mu$ is a compactly supported probability
measure on $\mathbb{R}^d$, the stochastic gradient algorithm (2.8) simplifies into

$$X_0 = x_0 \quad\text{and for } k \geq 0, \quad X_{k+1} = X_k - t_{k+1}\,\mathrm{grad}_{X_k} F_p(\cdot, P_{k+1}).$$

If furthermore $p = 2$, clearly $e_2 = \mathbb{E}[P_1]$ and $\mathrm{grad}_x F_p(\cdot,y) = 2(x-y)$, so that the
linear relation

$$X_{k+1} = (1 - 2t_{k+1})X_k + 2t_{k+1}P_{k+1}, \quad k \geq 0$$

holds true and an easy induction proves that

(2.9) $$X_k = x_0\prod_{j=0}^{k-1}(1 - 2t_{k-j}) + 2\sum_{j=0}^{k-1} P_{k-j}\,t_{k-j}\prod_{\ell=0}^{j-1}(1 - 2t_{k-\ell}), \quad k \geq 1.$$

Now, taking $t_k = \frac{1}{2k}$, we have

$$\prod_{j=0}^{k-1}(1 - 2t_{k-j}) = 0 \quad\text{and}\quad \prod_{\ell=0}^{j-1}(1 - 2t_{k-\ell}) = \frac{k-j}{k}$$

so that

$$X_k = \frac{1}{k}\sum_{j=0}^{k-1} P_{k-j}.$$

The stochastic gradient algorithm estimating the mean $e_2$ of $\mu$ is given by the
empirical mean of a growing sample of independent random variables with distribution
$\mu$. In this simple case, the result of Theorem 2.3 is nothing but the strong law of
large numbers. Moreover, fluctuations around the mean are given by the central
limit theorem and Donsker's theorem.
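
The identity between the Euclidean recursion with $t_k = \frac{1}{2k}$ and the empirical mean can be checked numerically. The sketch below is illustrative only; the function name `euclidean_sgd_mean`, the sample and the starting point are hypothetical choices. Because $1 - 2t_1 = 0$, the starting point $x_0$ is forgotten after the first step.

```python
import random


def euclidean_sgd_mean(points, x0):
    """Run X_{k+1} = (1 - 2 t_{k+1}) X_k + 2 t_{k+1} P_{k+1} with t_k = 1/(2k)."""
    x = list(x0)
    for k, point in enumerate(points, start=1):
        t_k = 1.0 / (2 * k)
        x = [(1.0 - 2.0 * t_k) * xi + 2.0 * t_k * pi for xi, pi in zip(x, point)]
    return x


rng = random.Random(1)
points = [(rng.uniform(-1, 1), rng.uniform(0, 2)) for _ in range(1000)]
sgd = euclidean_sgd_mean(points, x0=(5.0, -3.0))  # arbitrary starting point
empirical = [sum(coord) / len(points) for coord in zip(*points)]
print(sgd, empirical)
```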

2.2. Fluctuations of the stochastic gradient algorithm. The notations are
the same as in the beginning of Section 2.1. We still make Assumption 2.1. Let us
define $K$ and $\varepsilon$ as in (2.5) and let

(2.10) $$\delta_1 = \min\left(\frac{1}{C_{p,\mu,K}},\ \frac{\rho(K_\mu, B(a,r)^c)}{2p(2r)^{p-1}}\right).$$

We consider the time inhomogeneous $M$-valued Markov chain (2.8) in the
particular case when

(2.11) $$t_k = \min\left(\frac{\delta}{k},\ \delta_1\right), \quad k \geq 1$$

for some $\delta > 0$. The particular sequence $(t_k)_{k \geq 1}$ defined by (2.11) satisfies (2.6)
and (2.7), so Theorem 2.3 holds true and the stochastic gradient algorithm $(X_k)_{k \geq 0}$
converges a.s. and in $L^2$ to the $p$-mean $e_p$.

In order to study the fluctuations around the $p$-mean $e_p$, we define for $n \geq 1$ the
rescaled $T_{e_p}M$-valued Markov chain $(Y_k^n)_{k \geq 0}$ by

(2.12) $$Y_k^n = \frac{k}{\sqrt{n}}\exp_{e_p}^{-1} X_k.$$

We will prove convergence of the sequence of processes $(Y^n_{[nt]})_{t \geq 0}$ to a non-homogeneous
diffusion process. The limit process is defined in the following proposition:


Proposition 2.5. Assume that $H_p$ is $C^2$ in a neighborhood of $e_p$, and that $\delta > C_{p,\mu,K}^{-1}$. Define

$$\Gamma = \mathbb{E}\left[\mathrm{grad}_{e_p} F_p(\cdot, P_1) \otimes \mathrm{grad}_{e_p} F_p(\cdot, P_1)\right]$$

and $G_\delta(t)$ the generator

(2.13) $$G_\delta(t)f(y) := \left\langle d_y f,\ t^{-1}\left(y - \delta\nabla dH_p(y,\cdot)^\sharp\right)\right\rangle + \frac{\delta^2}{2}\,\mathrm{Hess}_y f(\Gamma)$$

where $\nabla dH_p(y,\cdot)^\sharp$ denotes the dual vector of the linear form $\nabla dH_p(y,\cdot)$.

There exists a unique inhomogeneous diffusion process $(y_\delta(t))_{t > 0}$ on $T_{e_p}M$ with
generator $G_\delta(t)$ and converging in probability to $0$ as $t \to 0^+$.

The process $y_\delta$ is continuous, converges a.s. to $0$ as $t \to 0^+$ and has the following
integral representation:

(2.14) $$y_\delta(t) = \sum_{i=1}^{d} t^{1-\delta\lambda_i}\int_0^t s^{\delta\lambda_i - 1}\langle \delta\sigma\,dB_s, e_i\rangle\, e_i, \quad t \geq 0,$$

where $B_t$ is a standard Brownian motion on $T_{e_p}M$, $\sigma \in \mathrm{End}(T_{e_p}M)$ satisfies
$\sigma\sigma^* = \Gamma$, $(e_i)_{1 \leq i \leq d}$ is an orthonormal basis diagonalizing the symmetric bilinear
form $\nabla dH_p(e_p)$ and $(\lambda_i)_{1 \leq i \leq d}$ are the associated eigenvalues.

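Note that the integral representation (2.14) implies that $y_\delta$ is the centered Gaussian
process with covariance

(2.15) $$\mathbb{E}\left[y_\delta^i(t_1)\,y_\delta^j(t_2)\right] = \frac{\delta^2\,\Gamma(e_i^* \otimes e_j^*)}{\delta(\lambda_i + \lambda_j) - 1}\; t_1^{1-\delta\lambda_i}\, t_2^{1-\delta\lambda_j}\,(t_1 \wedge t_2)^{\delta(\lambda_i+\lambda_j)-1},$$

where $y_\delta^i(t) = \langle y_\delta(t), e_i\rangle$, $1 \leq i,j \leq d$ and $t_1, t_2 \geq 0$.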
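
As a sanity check of the covariance formula (2.15), the following Python sketch evaluates it in a one-dimensional Euclidean setting. The function name `limit_covariance` and the numerical values are hypothetical. With $M = \mathbb{R}$, $p = 2$ and $t_k = \frac{1}{2k}$ one has $\delta = \frac12$, $\lambda = 2$ (the Hessian of $H_2$) and $\Gamma = 4\,\mathrm{Var}(P_1)$, and the formula reduces to $\mathrm{Var}(P_1)\,(t_1 \wedge t_2)$, the covariance of a Brownian motion as expected from Donsker's theorem. The sketch only requires $\delta(\lambda_i + \lambda_j) > 1$ to evaluate the formula; whether the hypothesis $\delta > C_{p,\mu,K}^{-1}$ of Proposition 2.5 holds for these values is not checked here.

```python
def limit_covariance(delta, lam_i, lam_j, gamma_ij, t1, t2):
    """Evaluate the covariance E[y^i(t1) y^j(t2)] given by (2.15)."""
    exponent = delta * (lam_i + lam_j) - 1.0
    if exponent <= 0.0:
        raise ValueError("formula (2.15) needs delta * (lam_i + lam_j) > 1")
    return (
        delta ** 2 * gamma_ij / exponent
        * t1 ** (1.0 - delta * lam_i)
        * t2 ** (1.0 - delta * lam_j)
        * min(t1, t2) ** exponent
    )


# Euclidean check in dimension one: delta = 1/2, lambda = 2, Gamma = 4 * Var(P_1).
variance = 3.0  # hypothetical value of Var(P_1)
cov = limit_covariance(0.5, 2.0, 2.0, 4.0 * variance, t1=0.7, t2=1.3)
print(cov, variance * min(0.7, 1.3))
```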

Our main result on the fluctuations of the stochastic gradient algorithm is the 
following: 

Theorem 2.6. Assume that either $e_p$ does not belong to the support of $\mu$ or $p \geq 2$.
Assume furthermore that $\delta > C_{p,\mu,K}^{-1}$. The sequence of processes $\left(Y^n_{[nt]}\right)_{t \geq 0}$ weakly
converges in $\mathbb{D}((0,\infty), T_{e_p}M)$ to $y_\delta$.

Remark 2.7. The assumption on $e_p$ implies that $H_p$ is of class $C^2$ in a
neighbourhood of $e_p$. In the case $p > 1$, in the "generic" situation for applications, $\mu$
is a discrete measure and $e_p$ does not belong to its support. For $p = 1$ one has to
be more careful since if $\mu$ is equidistributed in a random set of points, then with
positive probability $e_1$ belongs to the support of $\mu$.

Remark 2.8. From the proof of Proposition 2.2 in Section 3.1 we know that, when $p \in (1,2]$, the constant

$$C_{p,\mu,K} = p(2r)^{p-2}\left(\min\left(p-1,\ 2\alpha r\cot(2\alpha r)\right)\right)$$

is explicit. The constraint $\delta > C_{p,\mu,K}^{-1}$ can easily be checked in this case.

Remark 2.9. In the case $M = \mathbb{R}^d$, $Y_k^n = \frac{k}{\sqrt{n}}(X_k - e_p)$ and the tangent space $T_{e_p}M$
is identified with $\mathbb{R}^d$. Theorem 2.6 holds and, in particular, when $t = 1$, we obtain a
central limit theorem: $\sqrt{n}(X_n - e_p)$ converges as $n \to \infty$ to a centered Gaussian
$d$-variate distribution (with covariance structure given by (2.15) with $t_1 = t_2 = 1$).
In other words, the fluctuations of the stochastic gradient algorithm
are of scale $n^{-1/2}$ and asymptotically Gaussian.



3. Proofs 

For simplicity, we write $e = e_p$ in the proofs.

3.1. Proof of Proposition 2.2.

For $p = 1$ this is a direct consequence of [22] Theorem 3.7.
Next we consider the case $p \in (1,2)$.

Let $K \subset B(a,r)$ be a compact convex set containing the support of $\mu$. Let
$x \in K \setminus \{e\}$, $t = \rho(e,x)$, $u \in T_eM$ the unit vector such that $\exp_e(\rho(e,x)u) = x$,
and $\gamma_u$ the geodesic with initial speed $u$: $\dot\gamma_u(0) = u$. For $y \in K$, letting $h_y(s) = \rho(\gamma_u(s), y)^p$, $s \in [0,t]$, we have since $p > 1$

$$h_y(t) = h_y(0) + t\,h_y'(0) + \int_0^t (t-s)\,h_y''(s)\,ds$$

with the convention $h_y''(s) = 0$ when $\gamma_u(s) = y$. Indeed, if $y \notin \gamma_u([0,t])$ then $h_y$ is
smooth, and if $y \in \gamma_u([0,t])$, say $y = \gamma_u(s_0)$, then $h_y(s) = |s - s_0|^p$ and the formula
can easily be checked.
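By standard calculation,

(3.1) $$h_y''(s) \geq p\,\rho(\gamma_u(s),y)^{p-2}\left((p-1)\left\|\dot\gamma_u(s)^{T(y)}\right\|^2 + \left\|\dot\gamma_u(s)^{N(y)}\right\|^2\,\alpha\rho(\gamma_u(s),y)\cot\left(\alpha\rho(\gamma_u(s),y)\right)\right)$$

with $\dot\gamma_u(s)^{T(y)}$ (resp. $\dot\gamma_u(s)^{N(y)}$) the tangential (resp. the normal) part of $\dot\gamma_u(s)$
with respect to $n(\gamma_u(s),y) = \frac{1}{\rho(\gamma_u(s),y)}\exp^{-1}_{\gamma_u(s)}(y)$:

$$\dot\gamma_u(s)^{T(y)} = \left\langle\dot\gamma_u(s),\, n(\gamma_u(s),y)\right\rangle n(\gamma_u(s),y), \qquad \dot\gamma_u(s)^{N(y)} = \dot\gamma_u(s) - \dot\gamma_u(s)^{T(y)}.$$

From this we get

(3.2) $$h_y''(s) \geq p\,\rho(\gamma_u(s),y)^{p-2}\left(\min\left(p-1,\ 2\alpha r\cot(2\alpha r)\right)\right).$$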

Now

$$H_p(\gamma_u(t')) = \int_{B(a,r)} h_y(t')\,\mu(dy) = \int_{B(a,r)} h_y(0)\,\mu(dy) + t'\int_{B(a,r)} h_y'(0)\,\mu(dy) + \int_0^{t'}(t'-s)\left(\int_{B(a,r)} h_y''(s)\,\mu(dy)\right)ds$$

and $H_p(\gamma_u(t'))$ attains its minimum at $t' = 0$, so $\int_{B(a,r)} h_y'(0)\,\mu(dy) = 0$ and we have

$$H_p(x) = H_p(\gamma_u(t)) = H_p(e) + \int_0^t (t-s)\left(\int_{B(a,r)} h_y''(s)\,\mu(dy)\right)ds.$$

Using Equation (3.2) we get

(3.3) $$H_p(x) \geq H_p(e) + \int_0^t (t-s)\left(\int_{B(a,r)} p\,\rho(\gamma_u(s),y)^{p-2}\left(\min\left(p-1,\ 2\alpha r\cot(2\alpha r)\right)\right)\mu(dy)\right)ds.$$

Since $p \leq 2$ we have $\rho(\gamma_u(s),y)^{p-2} \geq (2r)^{p-2}$ and

(3.4) $$H_p(x) \geq H_p(e) + \frac{t^2}{2}\,p(2r)^{p-2}\left(\min\left(p-1,\ 2\alpha r\cot(2\alpha r)\right)\right).$$

So letting

$$C_{p,\mu,K} = p(2r)^{p-2}\left(\min\left(p-1,\ 2\alpha r\cot(2\alpha r)\right)\right)$$

we obtain

(3.5) $$H_p(x) \geq H_p(e) + \frac{C_{p,\mu,K}\,\rho(e,x)^2}{2}.$$

To finish let us consider the case $p \geq 2$.

In the proof of [1] Theorem 2.1, it is shown that $e$ is the only zero of the maps
$x \mapsto \mathrm{grad}_x H_p$ and $x \mapsto H_p(x) - H_p(e)$, and that $\nabla dH_p(e)$ is strictly positive.
This implies that (2.3) and (2.4) hold on some neighbourhood $B(e,\varepsilon)$ of $e$. By
compactness and the fact that $H_p - H_p(e)$ and $\mathrm{grad}\,H_p$ do not vanish on $K \setminus B(e,\varepsilon)$
and $H_p - H_p(e)$ is bounded, possibly modifying the constant $C_{p,\mu,K}$, (2.3) and (2.4)
also hold on $K \setminus B(e,\varepsilon)$.

□



3.2. Proof of Theorem 2.3.

Note that, for $x \neq y$,

$$\mathrm{grad}_x F_p(\cdot,y) = p\,\rho^{p-1}(x,y)\,\frac{-\exp_x^{-1}(y)}{\rho(x,y)} = -p\,\rho^{p-1}(x,y)\,n(x,y),$$

with $n(x,y) := \frac{\exp_x^{-1}(y)}{\rho(x,y)}$ a unit vector. So, with the condition (2.6) on $t_k$, the
random walk $(X_k)_{k \geq 0}$ cannot exit $K$: if $X_k \in K$ then there are two possibilities for
$X_{k+1}$:

• either $X_{k+1}$ is in the geodesic between $X_k$ and $P_{k+1}$ and belongs to $K$ by
convexity of $K$;

• or $X_{k+1}$ is after $P_{k+1}$, but since

$$\left\|t_{k+1}\,\mathrm{grad}_{X_k} F_p(\cdot,P_{k+1})\right\| = t_{k+1}\,p\,\rho^{p-1}(X_k,P_{k+1}) \leq \frac{\rho(K_\mu, B(a,r)^c)}{2p(2r)^{p-1}}\,p\,\rho^{p-1}(X_k,P_{k+1}) \leq \frac{\rho(K_\mu, B(a,r)^c)}{2},$$

we have in this case

$$\rho(P_{k+1}, X_{k+1}) \leq \frac{\rho(K_\mu, B(a,r)^c)}{2}$$

which implies that $X_{k+1} \in K$.

First consider the case $p \in [1,2)$.
For $k \geq 0$ let

$$t \mapsto E(t) := \frac{1}{2}\rho^2(e, \gamma(t)),$$

$(\gamma(t))_{t \in [0,t_{k+1}]}$ the geodesic satisfying $\dot\gamma(0) = -\mathrm{grad}_{X_k} F_p(\cdot,P_{k+1})$. We have for all
$t \in [0,t_{k+1}]$

(3.6) $$E''(t) \leq C(\beta,r,p) := p^2(2r)^{2p-1}\beta\,\mathrm{cotanh}(2\beta r)$$

(see e.g. [22]). By Taylor formula,

$$\rho(X_{k+1},e)^2 = 2E(t_{k+1}) = 2E(0) + 2t_{k+1}E'(0) + t_{k+1}^2E''(t) \quad\text{for some } t \in [0,t_{k+1}]$$

$$\leq \rho(X_k,e)^2 + 2t_{k+1}\left\langle\mathrm{grad}_{X_k} F_p(\cdot,P_{k+1}),\ \exp_{X_k}^{-1}(e)\right\rangle + t_{k+1}^2\,C(\beta,r,p).$$

Now from the convexity of $x \mapsto F_p(x,y)$ we have for all $x, y \in B(a,r)$

(3.7) $$F_p(e,y) - F_p(x,y) \geq \left\langle\mathrm{grad}_x F_p(\cdot,y),\ \exp_x^{-1}(e)\right\rangle.$$

This applied with $x = X_k$, $y = P_{k+1}$ yields

(3.8) $$\rho(X_{k+1},e)^2 \leq \rho(X_k,e)^2 - 2t_{k+1}\left(F_p(X_k,P_{k+1}) - F_p(e,P_{k+1})\right) + C(\beta,r,p)\,t_{k+1}^2.$$

Letting for $k \geq 0$, $\mathcal{F}_k = \sigma(X_\ell,\ 0 \leq \ell \leq k)$, we get

$$\mathbb{E}\left[\rho(X_{k+1},e)^2 \mid \mathcal{F}_k\right] \leq \rho(X_k,e)^2 - 2t_{k+1}\int_{B(a,r)}\left(F_p(X_k,y) - F_p(e,y)\right)\mu(dy) + C(\beta,r,p)\,t_{k+1}^2$$

$$= \rho(X_k,e)^2 - 2t_{k+1}\left(H_p(X_k) - H_p(e)\right) + C(\beta,r,p)\,t_{k+1}^2 \leq \rho(X_k,e)^2 + C(\beta,r,p)\,t_{k+1}^2$$

so that the process $(Y_k)_{k \geq 0}$ defined by

(3.9) $$Y_0 = \rho(X_0,e)^2 \quad\text{and for } k \geq 1, \quad Y_k = \rho(X_k,e)^2 - C(\beta,r,p)\sum_{j=1}^{k} t_j^2$$

is a bounded supermartingale. So it converges in $L^1$ and almost surely. Consequently
$\rho(X_k,e)^2$ also converges in $L^1$ and almost surely.
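Let

(3.10) $$a = \lim_{k \to \infty}\mathbb{E}\left[\rho(X_k,e)^2\right].$$

We want to prove that $a = 0$. We already proved that

(3.11) $$\mathbb{E}\left[\rho(X_{k+1},e)^2 \mid \mathcal{F}_k\right] \leq \rho(X_k,e)^2 - 2t_{k+1}\left(H_p(X_k) - H_p(e)\right) + C(\beta,r,p)\,t_{k+1}^2.$$

Taking the expectation and using Proposition 2.2 we obtain

(3.12) $$\mathbb{E}\left[\rho(X_{k+1},e)^2\right] \leq \mathbb{E}\left[\rho(X_k,e)^2\right] - t_{k+1}C_{p,\mu,K}\,\mathbb{E}\left[\rho(X_k,e)^2\right] + C(\beta,r,p)\,t_{k+1}^2.$$

An easy induction proves that for $\ell \geq 1$,

(3.13) $$\mathbb{E}\left[\rho(X_{k+\ell},e)^2\right] \leq \prod_{j=1}^{\ell}\left(1 - C_{p,\mu,K}\,t_{k+j}\right)\mathbb{E}\left[\rho(X_k,e)^2\right] + C(\beta,r,p)\sum_{j=1}^{\ell} t_{k+j}^2.$$

Letting $\ell \to \infty$ and using the fact that $\sum_{j=1}^{\infty} t_{k+j} = \infty$, which implies

$$\prod_{j=1}^{\infty}\left(1 - C_{p,\mu,K}\,t_{k+j}\right) = 0,$$

we get

(3.14) $$a \leq C(\beta,r,p)\sum_{j=1}^{\infty} t_{k+j}^2.$$

Finally using $\sum_{j=1}^{\infty} t_j^2 < \infty$ we obtain that $\lim_{k \to \infty}\sum_{j=1}^{\infty} t_{k+j}^2 = 0$, so $a = 0$. This
proves $L^2$ and almost sure convergence.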

Next assume that $p \geq 2$.

For $k \geq 0$ let

$$t \mapsto E_p(t) := H_p(\gamma(t)),$$

$(\gamma(t))_{t \in [0,t_{k+1}]}$ the geodesic satisfying $\dot\gamma(0) = -\mathrm{grad}_{X_k} F_p(\cdot,P_{k+1})$. With a calculation
similar to (3.6) we get for all $t \in [0,t_{k+1}]$

(3.15) $$E_p''(t) \leq 2C(\beta,r,p) := p^3(2r)^{3p-4}\left(2r\beta\,\mathrm{cotanh}(2\beta r) + 2p - 4\right)$$

(see e.g. [H]). By Taylor formula, 



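$$H_p(X_{k+1}) = E_p(t_{k+1}) = E_p(0) + t_{k+1}E_p'(0) + \frac{t_{k+1}^2}{2}E_p''(t) \quad\text{for some } t \in [0,t_{k+1}]$$

$$\leq H_p(X_k) - t_{k+1}\left\langle d_{X_k}H_p,\ \mathrm{grad}_{X_k} F_p(\cdot,P_{k+1})\right\rangle + t_{k+1}^2\,C(\beta,r,p).$$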



We get

$$\mathbb{E}\left[H_p(X_{k+1}) \mid \mathcal{F}_k\right] \leq H_p(X_k) - t_{k+1}\left\langle d_{X_k}H_p,\ \int_{B(a,r)}\mathrm{grad}_{X_k} F_p(\cdot,y)\,\mu(dy)\right\rangle + C(\beta,r,p)\,t_{k+1}^2$$

$$= H_p(X_k) - t_{k+1}\left\langle d_{X_k}H_p,\ \mathrm{grad}_{X_k} H_p(\cdot)\right\rangle + C(\beta,r,p)\,t_{k+1}^2$$

$$= H_p(X_k) - t_{k+1}\left\|\mathrm{grad}_{X_k} H_p(\cdot)\right\|^2 + C(\beta,r,p)\,t_{k+1}^2$$

$$\leq H_p(X_k) - C_{p,\mu,K}\,t_{k+1}\left(H_p(X_k) - H_p(e)\right) + C(\beta,r,p)\,t_{k+1}^2$$

(by Proposition 2.2) so that the process $(Y_k)_{k \geq 0}$ defined by

(3.16) $$Y_0 = H_p(X_0) - H_p(e) \quad\text{and for } k \geq 1, \quad Y_k = H_p(X_k) - H_p(e) - C(\beta,r,p)\sum_{j=1}^{k} t_j^2$$

is a bounded supermartingale. So it converges in $L^1$ and almost surely. Consequently
$H_p(X_k) - H_p(e)$ also converges in $L^1$ and almost surely.
Let

(3.17) $$a = \lim_{k \to \infty}\mathbb{E}\left[H_p(X_k) - H_p(e)\right].$$

We want to prove that $a = 0$. We already proved that

(3.18) $$\mathbb{E}\left[H_p(X_{k+1}) - H_p(e) \mid \mathcal{F}_k\right] \leq H_p(X_k) - H_p(e) - C_{p,\mu,K}\,t_{k+1}\left(H_p(X_k) - H_p(e)\right) + C(\beta,r,p)\,t_{k+1}^2.$$

Taking the expectation we obtain

(3.19) $$\mathbb{E}\left[H_p(X_{k+1}) - H_p(e)\right] \leq \left(1 - t_{k+1}C_{p,\mu,K}\right)\mathbb{E}\left[H_p(X_k) - H_p(e)\right] + C(\beta,r,p)\,t_{k+1}^2$$

so that proving that $a = 0$ is similar to the previous case.

Finally (2.3) proves that $\rho(X_k,e)^2$ converges in $L^1$ and almost surely to 0. □

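3.3. Proof of Proposition 2.5. Fix $\varepsilon > 0$. Any diffusion process on $[\varepsilon,\infty)$ with
generator $G_\delta(t)$ is solution of an SDE of the type

(3.20) $$dy_t = \frac{1}{t}L_\delta(y_t)\,dt + \delta\sigma\,dB_t$$

where $L_\delta(y) = y - \delta\nabla dH_p(y,\cdot)^\sharp$ and $B_t$ and $\sigma$ are as in Proposition 2.5. This SDE
can be solved explicitly on $[\varepsilon,\infty)$. The symmetric endomorphism $y \mapsto \nabla dH_p(y,\cdot)^\sharp$
is diagonalisable in the orthonormal basis $(e_i)_{1 \leq i \leq d}$ with eigenvalues $(\lambda_i)_{1 \leq i \leq d}$.
The endomorphism $L_\delta = \mathrm{id} - \delta\nabla dH_p(e)(\mathrm{id},\cdot)^\sharp$ is also diagonalisable in this basis
with eigenvalues $(1 - \delta\lambda_i)_{1 \leq i \leq d}$. The solution $y_t = \sum_{i=1}^{d} y_t^i e_i$ of (3.20) started at
$y_\varepsilon = \sum_{i=1}^{d} y_\varepsilon^i e_i$ is given by

(3.21) $$y_t = \sum_{i=1}^{d}\left(y_\varepsilon^i\,\varepsilon^{\delta\lambda_i - 1} + \int_\varepsilon^t s^{\delta\lambda_i - 1}\langle\delta\sigma\,dB_s, e_i\rangle\right)t^{1-\delta\lambda_i}\,e_i, \quad t \geq \varepsilon.$$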

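Now by definition of $C_{p,\mu,K}$ we clearly have

(3.22) $$C_{p,\mu,K} \leq \min_{1 \leq i \leq d}\lambda_i.$$

So the condition $\delta C_{p,\mu,K} > 1$ implies that for all $i$, $\delta\lambda_i - 1 > 0$, and as $\varepsilon \to 0$,

(3.23) $$\int_\varepsilon^t s^{\delta\lambda_i - 1}\langle\delta\sigma\,dB_s, e_i\rangle \to \int_0^t s^{\delta\lambda_i - 1}\langle\delta\sigma\,dB_s, e_i\rangle \quad\text{in probability.}$$

Assume that a continuous solution $y_t$ converging in probability to $0$ as $t \to 0^+$
exists. Since $y_\varepsilon^i\,\varepsilon^{\delta\lambda_i - 1} \to 0$ in probability as $\varepsilon \to 0$, we necessarily have using (3.23)

(3.24) $$y_t = \sum_{i=1}^{d} t^{1-\delta\lambda_i}\int_0^t s^{\delta\lambda_i - 1}\langle\delta\sigma\,dB_s, e_i\rangle\, e_i, \quad t \geq 0.$$

Note $y_t^i$ is Gaussian with variance $\frac{\delta^2\,\Gamma(e_i^* \otimes e_i^*)}{2\delta\lambda_i - 1}\,t$, so it converges in $L^2$ to $0$ as $t \to 0$.

Conversely, it is easy to check that equation (3.24) defines a solution to (3.20).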
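To prove the a.s. convergence to $0$ we use the representation

$$\int_0^t s^{\delta\lambda_i - 1}\langle\delta\sigma\,dB_s, e_i\rangle = B^i_{\varphi_i(t)}$$

where $B^i$ is a Brownian motion and $\varphi_i(t) = \frac{\delta^2\,\Gamma(e_i^* \otimes e_i^*)}{2\delta\lambda_i - 1}\,t^{2\delta\lambda_i - 1}$. Then by the law
of iterated logarithm

$$\limsup_{t \downarrow 0}\ t^{1-\delta\lambda_i}B^i_{\varphi_i(t)} \leq \limsup_{t \downarrow 0}\ t^{1-\delta\lambda_i}\sqrt{2\varphi_i(t)\ln\ln\left(\varphi_i(t)^{-1}\right)}.$$

But for $t$ small we have

$$\sqrt{2\varphi_i(t)\ln\ln\left(\varphi_i(t)^{-1}\right)} \leq t^{\delta\lambda_i - 3/4}$$

so

$$\limsup_{t \downarrow 0}\ t^{1-\delta\lambda_i}B^i_{\varphi_i(t)} \leq \lim_{t \downarrow 0} t^{1/4} = 0.$$

This proves a.s. convergence to 0. Continuity is easily checked using the integral
representation (3.24). □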

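3.4. Proof of Theorem 2.6. Consider the time homogeneous Markov chain $(Z_k^n)_{k \geq 0}$
with state space $[0,\infty) \times T_eM$ defined by

(3.25) $$Z_k^n = \left(\frac{k}{n},\ Y_k^n\right).$$

The first component has a deterministic evolution and will be denoted by $t_k^n$; it
satisfies

(3.26) $$t_{k+1}^n = t_k^n + \frac{1}{n}, \quad k \geq 0.$$

Let $k_0$ be such that

(3.27) $$\frac{\delta}{k_0} < \delta_1.$$

Using equations (2.8), (2.12) and (2.11), we have for $k \geq k_0$,

(3.28) $$Y_{k+1}^n = \frac{nt_k^n + 1}{\sqrt{n}}\exp_e^{-1}\left(\exp_{\exp_e\left(\frac{1}{\sqrt{n}\,t_k^n}Y_k^n\right)}\left(-\frac{\delta}{nt_k^n + 1}\,\mathrm{grad}_{\exp_e\left(\frac{1}{\sqrt{n}\,t_k^n}Y_k^n\right)} F_p(\cdot,P_{k+1})\right)\right).$$

Consider the transition kernel $P_n(z,dz')$ on $(0,\infty) \times T_eM$ defined for $z = (t,y)$
by

(3.29) $$P_n(z,A) = \mathbb{P}\left[\left(t + \frac{1}{n},\ \frac{nt+1}{\sqrt{n}}\exp_e^{-1}\left(\exp_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)}\left(-\frac{\delta}{nt+1}\,\mathrm{grad}_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)} F_p(\cdot,P_1)\right)\right)\right) \in A\right]$$

where $A \in \mathcal{B}((0,\infty) \times T_eM)$. Clearly this transition kernel drives the evolution of
the Markov chain $(Z_k^n)_{k \geq k_0}$.

For the sake of clarity, we divide the proof of Theorem 2.6 into four lemmas.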



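Lemma 3.1. Assume that either $p \geq 2$ or $e$ does not belong to the support $\mathrm{supp}(\mu)$
of $\mu$ (note this implies that for all $x \in \mathrm{supp}(\mu)$ the function $F_p(\cdot,x)$ is of class
$C^2$ in a neighbourhood of $e$). Fix $\delta > 0$. Let $B$ be a bounded set in $T_eM$ and let
$0 < \varepsilon < T$. We have for all $C^2$ function $f$ on $T_eM$

(3.30) $$n\left[f\left(\frac{nt+1}{\sqrt{n}}\exp_e^{-1}\left(\exp_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)}\left(-\frac{\delta}{nt+1}\,\mathrm{grad}_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)} F_p(\cdot,x)\right)\right)\right) - f(y)\right]$$
$$= \left\langle d_yf, \frac{y}{t}\right\rangle - \sqrt{n}\left\langle d_yf,\ \delta\,\mathrm{grad}_e F_p(\cdot,x)\right\rangle - \delta\nabla dF_p(\cdot,x)\left(\mathrm{grad}_y f, \frac{y}{t}\right) + \frac{\delta^2}{2}\mathrm{Hess}_y f\left(\mathrm{grad}_e F_p(\cdot,x) \otimes \mathrm{grad}_e F_p(\cdot,x)\right) + O\left(\frac{1}{\sqrt{n}}\right)$$

uniformly in $y \in B$, $x \in \mathrm{supp}(\mu)$, $t \in [\varepsilon,T]$.

Proof. Let $x \in \mathrm{supp}(\mu)$, $y \in T_eM$, $u, v \in \mathbb{R}$ sufficiently close to 0, and $q = \exp_e\left(\frac{uy}{t}\right)$. For $s \in [0,1]$ denote by $a \mapsto c(a,s,u,v)$ the geodesic with endpoints
$c(0,s,u,v) = e$ and

$$c(1,s,u,v) = \exp_{\exp_e\left(\frac{uy}{t}\right)}\left(-vs\,\mathrm{grad}_{\exp_e\left(\frac{uy}{t}\right)} F_p(\cdot,x)\right):$$

$$c(a,s,u,v) = \exp_e\left\{a\exp_e^{-1}\left(\exp_{\exp_e\left(\frac{uy}{t}\right)}\left(-sv\,\mathrm{grad}_{\exp_e\left(\frac{uy}{t}\right)} F_p(\cdot,x)\right)\right)\right\}.$$

This is a $C^2$ function of $(a,s,u,v) \in [0,1]^2 \times (-\eta,\eta)^2$, $\eta$ sufficiently small. It also
depends in a $C^2$ way on $x$ and $y$. Letting $c(a,s) = c\left(a, s, \frac{1}{\sqrt{n}}, \frac{\delta}{nt+1}\right)$, we have

$$\exp_e^{-1}\left(\exp_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)}\left(-\frac{\delta}{nt+1}\,\mathrm{grad}_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)} F_p(\cdot,x)\right)\right) = \partial_a c(0,1).$$

So we need a Taylor expansion up to order $n^{-1}$ of $\frac{nt+1}{\sqrt{n}}\,\partial_a c(0,1)$.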

y/n 

We have $c(a,s,0,1) = \exp_e\left(-as\,\mathrm{grad}_e F_p(\cdot,x)\right)$ and this implies

$$\partial_s^2\partial_a c(0,s,0,1) = 0, \quad\text{so}\quad \partial_s^2\partial_a c(0,s,u,1) = O(u).$$

On the other hand the identity $c(a,s,u,v) = c(a,sv,u,1)$ yields $\partial_s^2\partial_a c(a,s,u,v) = v^2\,\partial_s^2\partial_a c(a,sv,u,1)$, so we obtain

$$\partial_s^2\partial_a c(0,s,u,v) = O(uv^2)$$

and this yields

$$\partial_s^2\partial_a c(0,s) = O(n^{-5/2}),$$

uniformly in $s, x, y, t$. But since

$$\left\|\partial_a c(0,1) - \partial_a c(0,0) - \partial_s\partial_a c(0,0)\right\| \leq \frac{1}{2}\sup_{s \in [0,1]}\left\|\partial_s^2\partial_a c(0,s)\right\|$$

we only need to estimate $\partial_a c(0,0)$ and $\partial_s\partial_a c(0,0)$.
Denoting by $J(a)$ the Jacobi field $\partial_s c(a,0)$ we have

$$\frac{nt+1}{\sqrt{n}}\,\partial_a c(0,1) = \frac{nt+1}{\sqrt{n}}\,\partial_a c(0,0) + \frac{nt+1}{\sqrt{n}}\,\dot J(0) + O\left(\frac{1}{n^2}\right).$$

On the other hand

$$\frac{nt+1}{\sqrt{n}}\,\partial_a c(0,0) = \frac{nt+1}{\sqrt{n}}\cdot\frac{y}{\sqrt{n}\,t} = y + \frac{y}{nt}$$

so it remains to estimate $\dot J(0)$.

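The Jacobi field $a \mapsto J(a,u,v)$ with endpoints $J(0,u,v) = 0_e$ and

$$J(1,u,v) = -v\,\mathrm{grad}_{\exp_e\left(\frac{uy}{t}\right)} F_p(\cdot,x)$$

satisfies

$$\nabla_a^2 J(a,u,v) = -R\left(J(a,u,v),\ \partial_a c(a,0,u,v)\right)\partial_a c(a,0,u,v) = O(u^2v).$$

This implies that

$$\nabla_a^2 J(a) = O(n^{-2}).$$

Consequently, denoting by $P_{x_1,x_2} : T_{x_1}M \to T_{x_2}M$ the parallel transport along the
minimal geodesic from $x_1$ to $x_2$ (whenever it is unique) we have

(3.31) $$P_{c(1,0),e}J(1) = J(0) + \dot J(0) + O(n^{-2}) = \dot J(0) + O(n^{-2}).$$

But we also have

$$P_{c(1,0,u,v),e}J(1,u,v) = P_{c(1,0,u,v),e}\left(-v\,\mathrm{grad}_{c(1,0,u,v)} F_p(\cdot,x)\right)$$
$$= -v\,\mathrm{grad}_e F_p(\cdot,x) - v\,\nabla_{\partial_a c(0,0,u,v)}\mathrm{grad}\,F_p(\cdot,x) + O(vu^2)$$
$$= -v\,\mathrm{grad}_e F_p(\cdot,x) - v\,\nabla dF_p(\cdot,x)\left(\frac{uy}{t},\ \cdot\right)^\sharp + O(vu^2)$$

where we used $\partial_a c(0,0,u,v) = \frac{uy}{t}$ and, for vector fields $A, B$ on $TM$ and a $C^2$
function $f_1$ on $M$,

$$\left\langle\nabla_{A_e}\mathrm{grad}\,f_1,\ B_e\right\rangle = A_e\left\langle\mathrm{grad}\,f_1, B\right\rangle - \left\langle\mathrm{grad}\,f_1,\ \nabla_{A_e}B\right\rangle = A_e\left\langle df_1, B\right\rangle - \left\langle df_1,\ \nabla_{A_e}B\right\rangle = \nabla df_1(A_e, B_e)$$

which implies

$$\nabla_{A_e}\mathrm{grad}\,f_1 = \nabla df_1(A_e,\cdot)^\sharp.$$

We obtain

$$P_{c(1,0),e}J(1) = -\frac{\delta}{nt+1}\,\mathrm{grad}_e F_p(\cdot,x) - \frac{\delta}{nt+1}\,\nabla dF_p(\cdot,x)\left(\frac{y}{\sqrt{n}\,t},\ \cdot\right)^\sharp + O(n^{-2}).$$

Combining with (3.31) this gives

$$\dot J(0) = -\frac{\delta}{nt+1}\,\mathrm{grad}_e F_p(\cdot,x) - \frac{\delta}{nt+1}\,\nabla dF_p(\cdot,x)\left(\frac{y}{\sqrt{n}\,t},\ \cdot\right)^\sharp + O(n^{-2}).$$

So finally

(3.32) $$\frac{nt+1}{\sqrt{n}}\,\partial_a c(0,1) = y + \frac{y}{nt} - \frac{\delta}{\sqrt{n}}\,\mathrm{grad}_e F_p(\cdot,x) - \delta\,\nabla dF_p(\cdot,x)\left(\frac{y}{nt},\ \cdot\right)^\sharp + O\left(n^{-3/2}\right).$$

To get the final result we are left to make a Taylor expansion of $f$ up to order 2. □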
Define the following quantities:

(3.33) $$b_n(z) = n\int_{\{|z'-z| \leq 1\}}(z'-z)\,P_n(z,dz')$$

and

(3.34) $$a_n(z) = n\int_{\{|z'-z| \leq 1\}}(z'-z) \otimes (z'-z)\,P_n(z,dz').$$

The following property holds:

Lemma 3.2. Assume that either $p \geq 2$ or $e$ does not belong to the support $\mathrm{supp}(\mu)$.

(1) For all $R > 0$ and $\varepsilon > 0$, there exists $n_0$ such that for all $n \geq n_0$ and
$z \in [\varepsilon,T] \times B(0_e,R)$, where $B(0_e,R)$ is the open ball in $T_eM$ centered at
the origin with radius $R$,

(3.35) $$\int 1_{\{|z'-z| > 1\}}\,P_n(z,dz') = 0.$$

(2) For all $R > 0$ and $\varepsilon > 0$,

(3.36) $$\lim_{n \to \infty}\ \sup_{z \in [\varepsilon,T] \times B(0_e,R)}|b_n(z) - b(z)| = 0$$

with

(3.37) $$b(z) = \left(1,\ \frac{1}{t}L_\delta(y)\right) \quad\text{and}\quad L_\delta(y) = y - \delta\nabla dH_p(y,\cdot)^\sharp.$$

(3) For all $R > 0$ and $\varepsilon > 0$,

(3.38) $$\lim_{n \to \infty}\ \sup_{z \in [\varepsilon,T] \times B(0_e,R)}|a_n(z) - a(z)| = 0$$

with

(3.39) $$a(z) = \delta^2\,\mathrm{diag}(0,\Gamma) \quad\text{and}\quad \Gamma = \mathbb{E}\left[\mathrm{grad}_e F_p(\cdot,P_1) \otimes \mathrm{grad}_e F_p(\cdot,P_1)\right].$$
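Proof. (1) We use the notation $z = (t,y)$ and $z' = (t',y')$. We have

$$\int 1_{\{|z'-z| > 1\}}\,P_n(z,dz') = \int 1_{\{\max(|t'-t|,\,|y'-y|) > 1\}}\,P_n(z,dz') = \int 1_{\{\max(\frac{1}{n},\,|y'-y|) > 1\}}\,P_n(z,dz')$$

$$= \mathbb{P}\left[\left\|\frac{nt+1}{\sqrt{n}}\exp_e^{-1}\left(\exp_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)}\left(-\frac{\delta}{nt+1}\,\mathrm{grad}_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)} F_p(\cdot,P_1)\right)\right) - y\right\| > 1\right].$$

On the other hand, since $F_p(\cdot,x)$ is of class $C^2$ in a neighbourhood of $e$, we
have by (3.32)

(3.40) $$\left\|\frac{nt+1}{\sqrt{n}}\exp_e^{-1}\left(\exp_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)}\left(-\frac{\delta}{nt+1}\,\mathrm{grad}_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)} F_p(\cdot,P_1)\right)\right) - y\right\| \leq \frac{C\delta}{\sqrt{n}}$$

for some constant $C > 0$. Hence the probability above vanishes as soon as $\frac{C\delta}{\sqrt{n}} \leq 1$, which gives (3.35).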
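(2) Equation (3.35) implies that for $n \geq n_0$

$$b_n(z) = n\int(z'-z)\,P_n(z,dz') = \left(1,\ n\,\mathbb{E}\left[\frac{nt+1}{\sqrt{n}}\exp_e^{-1}\left(\exp_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)}\left(-\frac{\delta}{nt+1}\,\mathrm{grad}_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)} F_p(\cdot,P_1)\right)\right) - y\right]\right).$$

We have by Lemma 3.1

$$n\left(\frac{nt+1}{\sqrt{n}}\exp_e^{-1}\left(\exp_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)}\left(-\frac{\delta}{nt+1}\,\mathrm{grad}_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)} F_p(\cdot,P_1)\right)\right) - y\right)$$
$$= \frac{1}{t}y - \sqrt{n}\,\delta\,\mathrm{grad}_e F_p(\cdot,P_1) - \delta\nabla dF_p(\cdot,P_1)\left(\frac{1}{t}y,\ \cdot\right)^\sharp + O\left(\frac{1}{n^{1/2}}\right)$$

a.s. uniformly in $n$, and since

$$\mathbb{E}\left[\delta\sqrt{n}\,\mathrm{grad}_e F_p(\cdot,P_1)\right] = 0,$$

this implies that

$$n\,\mathbb{E}\left[\frac{nt+1}{\sqrt{n}}\exp_e^{-1}\left(\exp_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)}\left(-\frac{\delta}{nt+1}\,\mathrm{grad}_{\exp_e\left(\frac{y}{\sqrt{n}\,t}\right)} F_p(\cdot,P_1)\right)\right) - y\right]$$

converges to

(3.41) $$\frac{1}{t}y - \mathbb{E}\left[\delta\nabla dF_p(\cdot,P_1)\left(\frac{1}{t}y,\ \cdot\right)^\sharp\right] = \frac{1}{t}y - \delta\nabla dH_p\left(\frac{1}{t}y,\ \cdot\right)^\sharp.$$

Moreover the convergence is uniform in $z \in [\varepsilon,T] \times B(0_e,R)$, so this
yields (3.36).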
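(3) In the same way, using Lemma 3.1,

$$n\int(y'-y) \otimes (y'-y)\,P_n(z,dz') = \frac{1}{n}\mathbb{E}\left[\left(-\sqrt{n}\,\delta\,\mathrm{grad}_e F_p(\cdot,P_1)\right) \otimes \left(-\sqrt{n}\,\delta\,\mathrm{grad}_e F_p(\cdot,P_1)\right)\right] + o(1)$$
$$= \delta^2\,\mathbb{E}\left[\mathrm{grad}_e F_p(\cdot,P_1) \otimes \mathrm{grad}_e F_p(\cdot,P_1)\right] + o(1)$$

uniformly in $z \in [\varepsilon,T] \times B(0_e,R)$, so this yields (3.38).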



□

Lemma 3.3. Suppose that $t_n = \frac{\delta}{n}$ for some $\delta > 0$. For all $\delta > C_{p,\mu,K}^{-1}$,

(3.42) $$\sup_{n \geq 1}\ n\,\mathbb{E}\left[\rho^2(e,X_n)\right] < \infty.$$

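Proof. First consider the case $p \in [1,2)$.

We know by (3.12) that there exists some constant $C(\beta,r,p)$ such that

(3.43) $$\mathbb{E}\left[\rho^2(e,X_{k+1})\right] \leq \mathbb{E}\left[\rho^2(e,X_k)\right]\exp\left(-C_{p,\mu,K}\,t_{k+1}\right) + C(\beta,r,p)\,t_{k+1}^2.$$

From this (3.42) is a consequence of Lemma 0.0.1 (case $\alpha > 1$) in [16]. We give the
proof for completeness. We deduce easily by induction that for all $k \geq k_0$,

(3.44) $$\mathbb{E}\left[\rho^2(e,X_k)\right] \leq \mathbb{E}\left[\rho^2(e,X_{k_0})\right]\exp\left(-C_{p,\mu,K}\sum_{j=k_0+1}^{k} t_j\right) + C(\beta,r,p)\sum_{i=k_0+1}^{k} t_i^2\exp\left(-C_{p,\mu,K}\sum_{j=i+1}^{k} t_j\right)$$

where the convention $\sum_{j=k+1}^{k} = 0$ is used. With $t_n = \frac{\delta}{n}$ the following inequality
holds for all $i \geq k_0$ and $k \geq i$:

(3.45) $$\sum_{j=i+1}^{k} t_j = \delta\sum_{j=i+1}^{k}\frac{1}{j} \geq \delta\int_{i+1}^{k+1}\frac{dt}{t} \geq \delta\ln\frac{k+1}{i+1}.$$

Hence,

(3.46) $$\mathbb{E}\left[\rho^2(e,X_k)\right] \leq \mathbb{E}\left[\rho^2(e,X_{k_0})\right]\left(\frac{k_0+1}{k+1}\right)^{\delta C_{p,\mu,K}} + \frac{\delta^2C(\beta,r,p)}{(k+1)^{\delta C_{p,\mu,K}}}\sum_{i=k_0+1}^{k}\frac{(i+1)^{\delta C_{p,\mu,K}}}{i^2}.$$

For $\delta C_{p,\mu,K} > 1$ we have as $k \to \infty$

(3.47) $$\frac{\delta^2C(\beta,r,p)}{(k+1)^{\delta C_{p,\mu,K}}}\sum_{i=k_0+1}^{k}\frac{(i+1)^{\delta C_{p,\mu,K}}}{i^2} \sim \frac{\delta^2C(\beta,r,p)}{(k+1)^{\delta C_{p,\mu,K}}}\cdot\frac{k^{\delta C_{p,\mu,K}-1}}{\delta C_{p,\mu,K}-1} \sim \frac{\delta^2C(\beta,r,p)}{\delta C_{p,\mu,K}-1}\,k^{-1}$$

and

$$\mathbb{E}\left[\rho^2(e,X_{k_0})\right]\left(\frac{k_0+1}{k+1}\right)^{\delta C_{p,\mu,K}} = o(k^{-1}).$$

This implies that the sequence $k\,\mathbb{E}\left[\rho^2(e,X_k)\right]$ is bounded.

Next consider the case $p \geq 2$.

Now we have by (3.19) that

(3.48) $$\mathbb{E}\left[H_p(X_{k+1}) - H_p(e)\right] \leq \mathbb{E}\left[H_p(X_k) - H_p(e)\right]\exp\left(-C_{p,\mu,K}\,t_{k+1}\right) + C(\beta,r,p)\,t_{k+1}^2.$$

From this, arguing similarly, we obtain that the sequence $k\,\mathbb{E}\left[H_p(X_k) - H_p(e)\right]$ is
bounded. We conclude with (2.3). □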
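
The $1/k$ decay rate can be illustrated on the scalar recursion underlying (3.12). The Python sketch below is only an illustration: it iterates the equality case $u_{k+1} = (1 - c\,t_{k+1})u_k + c'\,t_{k+1}^2$ with $t_k = \delta/k$, where the function name `scaled_recursion` and the constants $c$ (standing for $C_{p,\mu,K}$) and $c'$ (standing for $C(\beta,r,p)$) are hypothetical. When $\delta c > 1$, the rescaled sequence $k\,u_k$ approaches $\frac{\delta^2 c'}{\delta c - 1}$, in agreement with (3.47).

```python
def scaled_recursion(delta, c_contract, c_noise, n_steps):
    """Iterate u_{k+1} = (1 - c_contract * t_{k+1}) u_k + c_noise * t_{k+1}^2
    with t_k = delta / k, starting from u_1 = 1, and return n_steps * u_{n_steps}."""
    u = 1.0
    for k in range(1, n_steps):
        t_next = delta / (k + 1)
        u = (1.0 - c_contract * t_next) * u + c_noise * t_next ** 2
    return n_steps * u


# With delta = 1, c = 2, c' = 1 the predicted limit delta^2 c' / (delta c - 1) is 1.
print(scaled_recursion(1.0, 2.0, 1.0, 100000))
```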



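Lemma 3.4. Assume $\delta > C_{p,\mu,K}^{-1}$ and that $H_p$ is $C^2$ in a neighbourhood of $e$. For
all $0 < \varepsilon < T$, the sequence of processes $\left(Y^n_{[nt]}\right)_{\varepsilon \leq t \leq T}$ is tight in $\mathbb{D}([\varepsilon,T], \mathbb{R}^d)$.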



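Proof. Denote by $\left(\tilde Y^n\right)_{n \geq 1} = \left(\left(Y^n_{[nt]}\right)_{\varepsilon \leq t \leq T}\right)_{n \geq 1}$ the sequence of processes. We prove
that from any subsequence $\left(\tilde Y^{\varphi(n)}\right)_{n \geq 1}$, we can extract a further subsequence
$\left(\tilde Y^{\psi(n)}\right)_{n \geq 1}$ that weakly converges in $\mathbb{D}([\varepsilon,T], \mathbb{R}^d)$.

Let us first prove that $\left(\tilde Y^{\varphi(n)}(\varepsilon)\right)_{n \geq 1}$ is bounded in $L^2$:

$$\left\|\tilde Y^{\varphi(n)}(\varepsilon)\right\|_2^2 = \frac{[\varphi(n)\varepsilon]^2}{\varphi(n)}\,\mathbb{E}\left[\rho^2\left(e, X_{[\varphi(n)\varepsilon]}\right)\right] \leq \varepsilon\,\sup_{n \geq 1}\left(n\,\mathbb{E}\left[\rho^2(e,X_n)\right]\right)$$

and the last term is bounded by Lemma 3.3.

Consequently $\left(\tilde Y^{\varphi(n)}(\varepsilon)\right)_{n \geq 1}$ is tight. So there is a subsequence $\left(\tilde Y^{\psi(n)}(\varepsilon)\right)_{n \geq 1}$
that weakly converges in $T_eM$ to the distribution $\nu_\varepsilon$. Thanks to the Skorohod theorem,
which allows us to realize it as an a.s. convergence, and to Lemma 3.2, we can apply
Theorem 11.2.3 of [20], and we obtain that the sequence of processes $\left(\tilde Y^{\psi(n)}\right)_{n \geq 1}$
weakly converges to a diffusion $(y_t)_{\varepsilon \leq t \leq T}$ with generator $G_\delta(t)$ given by (2.13) and
such that $y_\varepsilon$ has law $\nu_\varepsilon$. This completes the proof of Lemma 3.4. □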



Proof of Theorem 2.6. Let $\tilde Y^n = \left(Y^n_{[nt]}\right)_{0 < t \leq T}$. It is sufficient to prove that
any subsequence of $\left(\tilde Y^n\right)_{n \geq 1}$ has a further subsequence which converges in law
to $(y_\delta(t))_{0 < t \leq T}$. So let $\left(\tilde Y^{\varphi(n)}\right)_{n \geq 1}$ be a subsequence. By Lemma 3.4 with $\varepsilon = 1/m$
there exists a subsequence which converges in law on $[1/m,T]$. Then we extract a
sequence indexed by $m$ of subsequences and take the diagonal subsequence $\tilde Y^{\eta(n)}$.
This subsequence converges in $\mathbb{D}((0,T], \mathbb{R}^d)$ to $(y'(t))_{t \in (0,T]}$. On the other hand, as
in the proof of Lemma 3.4, we have

$$\left\|\tilde Y^{\eta(n)}(t)\right\|_2^2 \leq Ct$$

for some $C > 0$. So, passing to the limit, $\|y'(t)\|_2 \to 0$ as $t \to 0$, which in turn implies $y'(t) \to 0$ in probability
as $t \to 0$. The uniqueness statement in Proposition 2.5 implies that $(y'(t))_{t \in (0,T]}$ and
$(y_\delta(t))_{t \in (0,T]}$ are equal in law. This completes the proof. □